Comparing the UMAPs of the scRNAseq data built from all genes with those built from metabolic genes only, there is a marked difference in the capacity to separate the cells into their respective cell-types. With metabolic genes only, just the proliferative T-cells are clearly separated from the rest. This is consistent with proliferative T-cells being well known for drastically changing their metabolism to meet their proliferative demands.
Regarding the pseudo-bulk data, which was used to build one model per cell-type (when present) for each sample of each individual, the same separation of the proliferative T-cells is still visible. Furthermore, the other cell-types seem to group together.
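To make the grouping behind the pseudo-bulk data concrete, the following is a minimal sketch of one common way to build such profiles: summing the raw counts of all cells that share the same individual, sample and cell-type. The text does not state which aggregation was actually used, so the summation, the function name `pseudo_bulk` and the toy gene counts are all assumptions made for illustration.

```python
from collections import defaultdict


def pseudo_bulk(cells):
    """Sum per-cell gene counts within each (individual, sample, cell_type) group.

    `cells` is an iterable of (individual, sample, cell_type, counts) tuples,
    where `counts` maps a gene name to the raw count observed in one cell.
    Summation is an assumption; averaging is another common choice.
    """
    profiles = defaultdict(lambda: defaultdict(int))
    for individual, sample, cell_type, counts in cells:
        key = (individual, sample, cell_type)
        for gene, n in counts.items():
            profiles[key][gene] += n
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {key: dict(genes) for key, genes in profiles.items()}


# Hypothetical toy input: three cells from one sample of one individual.
cells = [
    ("ind1", "s1", "naive_CD8", {"HK1": 2, "CPT1A": 5}),
    ("ind1", "s1", "naive_CD8", {"HK1": 1, "CPT1A": 3}),
    ("ind1", "s1", "proliferative_CD8", {"HK1": 7}),
]
bulk = pseudo_bulk(cells)
for key, profile in bulk.items():
    print(key, profile)
```

A group only appears in the result when at least one cell of that cell-type is present in the sample, which mirrors the "when present" condition on the cell-type models.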
Next, a random forest classifier was trained to predict the cell-type from either the pseudo-bulk data or the reaction presence data (which represents the reactions that were considered present when constructing the cell-type specific models).
The evaluation metric used to assess prediction capability was the Matthews correlation coefficient (MCC), as it is appropriate for multi-class classification problems, is symmetric (no class is more important than any other), and is robust to class imbalance (i.e., when the classes are not evenly represented, which is the case in this data, with proliferative T-cells having far fewer models than the other cell-types). In the binary case, with TP, TN, FP and FN denoting the numbers of true positives, true negatives, false positives and false negatives, the MCC formula is the following (the multi-class MCC generalizes this expression to the full confusion matrix):
\[ \mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \]
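The MCC ranges from -1 to 1. A value of 1 means that all models were assigned to the correct cell-type, while a value of -1 indicates complete disagreement between predicted and true cell-types (in the multi-class setting, the attainable minimum lies between -1 and 0, depending on the number and distribution of the classes). A value of 0 means that the classifier is no better than random guessing.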
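As an illustration of how the MCC can be computed when there are more than two cell-types, the sketch below implements the standard multi-class generalization from lists of true and predicted labels. It is not taken from the analysis itself: the function name `multiclass_mcc` and the example labels are hypothetical. For two classes it reduces to the binary formula given above.

```python
from collections import Counter
from math import sqrt


def multiclass_mcc(y_true, y_pred):
    """Multi-class Matthews correlation coefficient from two label sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    s = len(y_true)                                   # total number of samples
    c = sum(t == p for t, p in zip(y_true, y_pred))   # correctly classified samples
    t_k = Counter(y_true)                             # times each class truly occurs
    p_k = Counter(y_pred)                             # times each class is predicted
    labels = set(t_k) | set(p_k)
    numerator = c * s - sum(t_k[k] * p_k[k] for k in labels)
    var_true = s * s - sum(t_k[k] ** 2 for k in labels)
    var_pred = s * s - sum(p_k[k] ** 2 for k in labels)
    denominator = sqrt(var_true * var_pred)
    # By convention, return 0 when the coefficient is undefined
    # (e.g. the classifier predicts a single class for every sample).
    return numerator / denominator if denominator else 0.0


# Hypothetical labels, chosen only to exercise the function.
truth = ["naive_CD8", "naive_CD8", "proliferative_CD8", "cytotoxic_CD8"]
perfect = multiclass_mcc(truth, truth)
constant = multiclass_mcc(truth, ["naive_CD8"] * len(truth))
# Binary check: TP=1, TN=2, FP=0, FN=1 gives 2 / sqrt(12) with the binary formula.
binary = multiclass_mcc([1, 1, 0, 0], [1, 0, 0, 0])
print(perfect, constant, round(binary, 4))
```

Because the coefficient uses every cell of the confusion matrix, a classifier that always predicts the majority cell-type scores 0 rather than benefiting from the class imbalance.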
The best results are obtained with the pseudo-bulk data, with an MCC greater than 0.75. Prediction capacity decreases when using the reaction presence data, with an MCC of around 0.56. This suggests that the models represent the cell-types reasonably well in terms of which reactions are included or excluded to reflect the transcriptomics data.
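The classification setup described above can be sketched with scikit-learn as follows. This is only an illustration under stated assumptions: the binary reaction presence matrix is randomly generated stand-in data, the class sizes are made up (with fewer proliferative models to mimic the imbalance), and the hyperparameters and the use of stratified 5-fold cross-validation are choices made for the example rather than details given in the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_val_predict

rng = np.random.default_rng(0)

# Hypothetical stand-in data: one row per cell-type model, one column per
# reaction, 1 if the reaction is present in the model and 0 otherwise.
cell_types = ["naive_CD4", "naive_CD8", "proliferative_CD8"]
sizes = [25, 25, 10]            # imbalanced on purpose
n_reactions = 40
presence_prob = rng.uniform(0.1, 0.9, size=(len(cell_types), n_reactions))
X = np.vstack([
    (rng.random((n, n_reactions)) < presence_prob[k]).astype(int)
    for k, n in enumerate(sizes)
])
y = np.repeat(cell_types, sizes)

# Assumed evaluation scheme: out-of-fold predictions from stratified 5-fold CV,
# so every model is predicted by a forest that did not see it during training.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
y_pred = cross_val_predict(clf, X, y, cv=cv)

mcc = matthews_corrcoef(y, y_pred)
print(f"cross-validated MCC: {mcc:.2f}")
```

The same code applies to pseudo-bulk data by replacing `X` with an expression matrix (models by genes), which allows the two feature sets to be compared on the same metric.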
[explanation]
[analyse results]
As expected, proliferative CD4 and CD8 T-cells seem to have a higher biomass flux than their naive counterparts. For the remaining cell-types, the biomass flux varies across the different models, with the exception of cytotoxic CD8 and IL17+ CD4 T-cells, which have relatively low biomass fluxes.
Naive CD8 T-cells have markedly higher ATP production than their proliferative counterparts. However, no clear difference is seen between naive and proliferative CD4 T-cells.
When comparing biomass flux and ATP production, ATP production is clearly higher in IL17+ CD4 T-cells and naive CD8 T-cells. In proliferative CD4 and CD8 T-cells, the flux through the biomass reaction is clearly higher than their ATP production.
In general, all cell-types obtain most of their FADH2 and NADH from fatty acid oxidation (FAO), which is expected according to the literature.